Mining top-k strongly correlated item pairs without minimum correlation threshold

Authors

  • Zengyou He
  • Xiaofei Xu
  • Shengchun Deng
Abstract

Given a user-specified minimum correlation threshold and a transaction database, the problem of mining strongly correlated item pairs is to find all item pairs whose Pearson's correlation coefficient is above the threshold. However, setting such a threshold is by no means an easy task. In this paper, we consider a more practical problem: mining the top-k strongly correlated item pairs, where k is the desired number of item pairs with the largest correlation values. Based on the FP-tree data structure, we propose an efficient algorithm, called Tkcp, for mining such patterns without a minimum correlation threshold. Our experimental results show that the Tkcp algorithm outperforms the Taper algorithm, an efficient algorithm for mining correlated item pairs, even under the assumption of an optimally chosen correlation threshold. Thus, we conclude that mining top-k strongly correlated pairs without a minimum correlation threshold is preferable to the original threshold-based mining.
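To make the problem statement concrete, the following is a minimal brute-force sketch of top-k pair mining. It is not the Tkcp algorithm and does not build an FP-tree; it only illustrates what "top-k pairs by Pearson's correlation" means for binary item data, where the coefficient reduces to the phi coefficient computed from item and pair counts. The function names `phi` and `top_k_pairs` and the convention of scoring an undefined correlation as 0 are assumptions made for illustration.

```python
import heapq
from collections import Counter
from itertools import combinations
from math import sqrt


def phi(n, n_a, n_b, n_ab):
    """Pearson's correlation (phi coefficient) of two binary items.

    n    : number of transactions
    n_a  : transactions containing item A
    n_b  : transactions containing item B
    n_ab : transactions containing both A and B
    """
    denom = sqrt(n_a * n_b * (n - n_a) * (n - n_b))
    if denom == 0:
        # An item that occurs in no or in all transactions has zero variance;
        # scoring such pairs as 0 is an assumption of this sketch.
        return 0.0
    return (n * n_ab - n_a * n_b) / denom


def top_k_pairs(transactions, k):
    """Return the k item pairs with the largest phi, by exhaustive enumeration."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = sorted(set(t))  # deduplicate; sorting gives canonical pair keys
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    # Score every possible pair, including pairs that never co-occur.
    scored = (
        ((a, b), phi(n, item_count[a], item_count[b], pair_count.get((a, b), 0)))
        for a, b in combinations(sorted(item_count), 2)
    )
    return heapq.nlargest(k, scored, key=lambda entry: entry[1])


if __name__ == "__main__":
    db = [["a", "b"], ["a", "b"], ["c"], ["c"]]
    print(top_k_pairs(db, 1))
```

No correlation threshold appears anywhere in the sketch: the user supplies only k. The cost is quadratic in the number of distinct items, which is the kind of exhaustive candidate enumeration that more efficient algorithms aim to avoid.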

Similar resources

Top-k Correlation Computation

Recently, there has been considerable interest in efficiently computing strongly correlated pairs in large databases. Most previous studies require the specification of a minimum correlation threshold to perform the computation. However, it may be difficult for users to provide an appropriate threshold in practice, since different data sets typically have different characteristics. To this end,...

A FP-Tree Based Approach for Mining All Strongly Correlated Pairs without Candidate Generation

Given a user-specified minimum correlation threshold and a transaction database, the problem of mining all strongly correlated pairs is to find all item pairs with Pearson's correlation coefficients above the threshold. Despite the use of an upper-bound-based pruning technique in the Taper algorithm [1], when the numbers of items and transactions are very large, candidate pair generation and test is...

Extracting Support Based k most Strongly Correlated Item Pairs in Large Transaction Databases

The support-confidence framework is misleading in finding statistically meaningful relationships in market basket data. The alternative is to find strongly correlated item pairs from the basket data. However, queries for strongly correlated pairs suffer from the problem of setting a suitable threshold. To overcome this, the problem of finding the top-k pairs has been introduced. Most of the existing techniques are multi-p...

Scaling up top-K cosine similarity search

Article history: received 21 September 2009; received in revised form 23 August 2010; accepted 23 August 2010; available online 8 September 2010. Recent years have witnessed an increased interest in computing cosine similarity in many application domains. Most previous studies require the specification of a minimum similarity threshold to perform the cosine similarity computation. However, it is us...

Efficient Mining of Top-K Closed Sequences

Sequence mining is an important data mining task. To retrieve interesting sequences from a large database, a minimum support threshold must be specified. Unfortunately, specifying an appropriate support threshold is very difficult for users who are new to mining queries and task-specific data. To avoid this difficulty of specification of the appropriate support thre...

Journal title:
  • KES Journal

Volume 10, Issue 

Pages  -

Publication date: 2006